KEY VARIABLES OF INTEREST:
Here is the first pass at the plot.
Here’s some text for V2
Here’s some text for V3
Here’s some text for V4
---
title: "Big Cities Health Inventory Data Visualization"
output:
flexdashboard::flex_dashboard:
orientation: columns
vertical_layout: fill
social: menu
source_code: embed
---
```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(rio)
library(colorblindr)
library(janitor)
library(magrittr)
library(ggrepel)
library(fontawesome)
```
# Background {data-orientation=rows data-icon="fa-info-circle"}
Sidebar {.sidebar}
-------------------------------
**Background**
The [Big Cities Health Coalition](https://twitter.com/bigcitieshealth?lang=en) (BCHC) is a large-scale collaboration among 30 of the largest urban health departments in the United States. See the BCHC's [informational brochure](https://static1.squarespace.com/static/534b4cdde4b095a3fb0cae21/t/5c7fc5cd6e9a7f44b5abf311/1551877582500/BCHC_ABOUT+US.pdf) for more details. You can download the complete dataset [here](http://bchi.bigcitieshealth.org/rails/active_storage/blobs/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBGdz09IiwiZXhwIjpudWxsLCJwdXIiOiJibG9iX2lkIn19--c6b5c30fbd8b79859797e1dc260a06064c8f3864/Current%20BCHI%20Platform%20Dataset%20(7-18)%20-%20Updated%20BCHI%20Platform%20Dataset%20-%20BCHI,%20Phase%20I%20&%20II.csv?disposition=attachment), which contains over 30,000 data points on a large variety of health indicators, e.g., behavioral health & substance abuse, chronic disease, environmental health, and life expectancy.
This project includes only a tiny fraction of the available BCHC data, focusing in particular on **obesity rate**, **heart disease mortality rate**, and **opioid-related mortality**. Click on the icons to the right for more information on these variables. The goal of this project is provide three data visualizations using these variables and document different iterations of these visualizations.
Row {data-height=600}
-----------------------------------------------------------------------
### Title {.no-title}
Click the image below to access the BCHC data platform:
[](http://www.bigcitieshealth.org/city-data)
### Title {.no-title}
Cities included in the BCHC. Click the map below for more information on city membership.
[](http://www.bigcitieshealth.org/our-members-big-cities-health-coalition-bchc/)
Row {data-height=90}
-----------------------------------------------------------------------
### Title {.no-title}
**KEY VARIABLES OF INTEREST:**
Row {data-height=300}
-----------------------------------------------------------------------
### Title {.no-title}
*Obesity Rate*
[](http://www.bigcitieshealth.org/obesity-physical-activity)
### Title {.no-title}
*Heart Disease Mortality Rate*
[](https://bchi.bigcitieshealth.org/indicators/1834/searches/22955)
### Title {.no-title}
*Opioid-Related Mortality Rate*
[](http://www.bigcitieshealth.org/combatting-opioids)
```{r import data, warning=FALSE}
data_raw <- import("http://bchi.bigcitieshealth.org/rails/active_storage/blobs/eyJfcmFpbHMiOnsibWVzc2FnZSI6IkJBaHBGdz09IiwiZXhwIjpudWxsLCJwdXIiOiJibG9iX2lkIn19--c6b5c30fbd8b79859797e1dc260a06064c8f3864/Current%20BCHI%20Platform%20Dataset%20(7-18)%20-%20Updated%20BCHI%20Platform%20Dataset%20-%20BCHI,%20Phase%20I%20&%20II.csv?disposition=attachment")
# wrangle data
data_filt <- data_raw %>%
clean_names() %>%
select(shortened_indicator_name, year, sex, race_ethnicity, value, place) %>%
filter(shortened_indicator_name %in% c("Adult Physical Activity Levels", "Teen Physical Activity Levels", "Adult Binge Drinking","Adult Obesity","Heart Disease Mortality Rate","Bike Score","Walkability","Median Household Income","Race/Ethnicity","Death Rate (Overall)")) %>%
mutate(value = as.numeric(value)) %>%
mutate_at(c("sex", "race_ethnicity", "place"), factor) %>%
mutate(place = plyr::mapvalues(x = .$place, from = c("Fort Worth (Tarrant County), TX", "Indianapolis (Marion County), IN", "Las Vegas (Clark County), NV", "Miami (Miami-Dade County), FL", "Oakland (Alameda County), CA", "Portland (Multnomah County), OR"), to = c("Fort Worth, TX", "Indianapolis, IN", "Las Vegas, NV", "Miami, FL", "Oakland, CA", "Portland, OR"))) %>%
na.omit()
```
# Obesity x City {data-icon="fa-weight"}
Sidebar {.sidebar}
-------------------------------
**Plot #1**
This plot represents obesity rates per city averaged across all years in the dataset (2010-2018).
```{r, warning}
# wrangle data
data_obesity <- data_filt %>%
filter(shortened_indicator_name == "Adult Obesity") %>%
spread(shortened_indicator_name, value) %>%
group_by(place) %>%
summarise(avg_obesity = mean(`Adult Obesity`, na.rm = TRUE),
sd_obesity = sd(`Adult Obesity`),
n = n()) %>%
mutate(se_obesity = sd_obesity/(sqrt(n)))
```
Column {data-width=650}
-----------------------------------------------------------------------
### Final plot
```{r}
data_obesity %>%
mutate(compare_us_tot = ifelse(
avg_obesity > .$avg_obesity[which(data_obesity$place == "U.S. Total")], "above",
ifelse(avg_obesity < .$avg_obesity[which(data_obesity$place == "U.S. Total")], "below", "avg"))) %>%
ggplot(aes(fct_reorder(place, avg_obesity), avg_obesity)) +
geom_col(aes(fill = compare_us_tot), alpha = 0.8) +
coord_flip() +
scale_y_continuous(labels = scales::percent_format(scale = 1)) +
scale_fill_manual(values = c("#BA4A00", "black", "#ABCFF7")) +
labs(title = "Percent of Adults Who Are Obese", y = "Percent", x = NULL, caption = "States above the U.S. average are colored red. States below the U.S. average are colored green.") +
theme_minimal() +
geom_hline(yintercept = data_obesity$avg_obesity[which(data_obesity$place == "U.S. Total")], linetype = 2) +
theme(legend.position = "none")
```
Column {.tabset data-width=350}
-----------------------------------------------------------------------
### Version 1
```{r}
data_obesity %>%
ggplot(aes(place, avg_obesity, avg_obesity)) +
geom_col() +
coord_flip()
```
> Here is the first pass at the plot.
### Version 2
```{r}
data_obesity %>%
ggplot(aes(fct_reorder(place, avg_obesity), avg_obesity)) +
geom_col() +
coord_flip() +
scale_y_continuous(labels = scales::percent_format(scale = 1)) +
labs(title = "Percent of Adults Who Are Obese", y = "Percent", x = NULL) +
theme_minimal()
```
> Here's some text for V2
### Version 3
```{r}
data_obesity %>%
mutate(compare_us_tot = ifelse(
avg_obesity > .$avg_obesity[which(data_obesity$place == "U.S. Total")], "above",
ifelse(avg_obesity < .$avg_obesity[which(data_obesity$place == "U.S. Total")], "below", "avg"))) %>%
ggplot(aes(fct_reorder(place, avg_obesity), avg_obesity)) +
geom_segment(aes(color = compare_us_tot, x = fct_reorder(place, avg_obesity), xend = place, y=0, yend = avg_obesity), size = 1, alpha = 0.7) +
geom_point(aes(color = compare_us_tot), size = 3, alpha = 0.7) +
coord_flip() +
scale_y_continuous(labels = scales::percent_format(scale = 1)) +
scale_color_manual(values = c("#BA4A00", "black", "#ABCFF7")) +
labs(title = "Percent of Adults Who Are Obese", y = "Percent", x = NULL, caption = "States above the U.S. average are colored red. States below the U.S. average are colored green.") +
theme_minimal() +
geom_hline(yintercept = data_obesity$avg_obesity[which(data_obesity$place == "U.S. Total")], linetype = 2) +
theme(legend.position = "none")
```
> Here's some text for V3
### Version 4
```{r}
data_obesity %>%
mutate(compare_us_tot = ifelse(
avg_obesity > .$avg_obesity[which(data_obesity$place == "U.S. Total")], "above",
ifelse(avg_obesity < .$avg_obesity[which(data_obesity$place == "U.S. Total")], "below", "avg"))) %>%
ggplot(aes(fct_reorder(place, avg_obesity), avg_obesity)) +
geom_errorbar(aes(ymin = avg_obesity - 1.96*se_obesity,
ymax = avg_obesity + 1.96*se_obesity),
color = "gray40") +
geom_point(aes(color = compare_us_tot), size = 4, alpha = 0.7) +
coord_flip() +
scale_y_continuous(labels = scales::percent_format(scale = 1)) +
scale_color_manual(values = c("#BA4A00", "black", "#ABCFF7")) +
labs(title = "Percent of Adults Who Are Obese", y = "Percent", x = NULL, caption = "States above the U.S. average are colored red. States below the U.S. average are colored green.") +
theme_minimal() +
geom_hline(yintercept = data_obesity$avg_obesity[which(data_obesity$place == "U.S. Total")], linetype = 2) +
theme(legend.position = "none")
```
> Here's some text for V4
# Heart Disease x Obesity {data-icon="fa-heartbeat"}
Sidebar {.sidebar}
-------------------------------
Second plot text
```{r}
# wrangle data
obesity_hdmr <- data_filt %>%
filter(shortened_indicator_name %in% c("Adult Obesity", "Heart Disease Mortality Rate"),
sex == "Both",
race_ethnicity == "All",
place != "U.S. Total") %>%
mutate(i = row_number()) %>%
spread(shortened_indicator_name, value) %>%
group_by(place) %>%
summarize(avg_obesity = mean(`Adult Obesity`, na.rm = TRUE),
avg_hdmr = mean(`Heart Disease Mortality Rate`, na.rm = TRUE))
```
Column {data-width=650}
-----------------------------------------------------------------------
### Final plot
```{r}
## 3 most obese cities
top_3_obese <- obesity_hdmr %>%
top_n(3, avg_obesity)
## 3 least obese cities
bottom_3_obese <- obesity_hdmr %>%
top_n(-3, avg_obesity)
obesity_hdmr %>%
ggplot(aes(avg_obesity, avg_hdmr)) +
geom_point(size = 5, alpha = 0.7, color = "gray70") +
geom_point(data = top_3_obese, size = 5, color = "#BA4A00", alpha = 0.5) +
geom_point(data = bottom_3_obese, size = 5, color = "#ABCFF7", alpha= 0.5) +
geom_smooth(method = "lm", alpha = 0.2, color = "gray60") +
geom_text_repel(data = top_3_obese, aes(label = place), min.segment.length = 0) +
geom_text_repel(data = bottom_3_obese, aes(label = place), min.segment.length = 0) +
theme_minimal() +
scale_x_continuous(labels = scales::percent_format(scale = 1)) +
labs(x = "Percent Obese", y = "Heart Disease Mortality Rate", title = "Relationship between Obesity and Heart Disease", subtitle = "State labels represent 3 most/least obese states", caption = "3 most/least obese states are colored red/green, respectively. \n Heart Disease Mortality Rate is age-adjusted per 100,000 people.")
```
Column {.tabset data-width=350}
-----------------------------------------------------------------------
### Version 1
```{r}
obesity_hdmr %>%
ggplot(aes(avg_obesity, avg_hdmr)) +
geom_point() +
geom_smooth(method = "lm")
```
### Version 2
```{r}
obesity_hdmr %>%
ggplot(aes(avg_obesity, avg_hdmr)) +
geom_point() +
geom_smooth(method = "lm") +
geom_text_repel(aes(label = place)) +
theme_minimal()
```
# Opioid Deaths x Gender {data-icon="fa-tablets"}
Sidebar {.sidebar}
-------------------------------
Second plot text
```{r}
# wrangle data
data_opioid <- data_raw %>%
clean_names() %>%
select(shortened_indicator_name, year, sex, race_ethnicity, value, place) %>%
filter(shortened_indicator_name %in% c("Opioid-Related Overdose Mortality Rate")) %>%
mutate(value = as.numeric(value)) %>%
mutate_at(c("sex", "race_ethnicity", "place"), factor) %>%
na.omit()
# identify city with highest opioid-related overdose mortality rate from 2010 to 2016
top_opioid = data_opioid %>%
filter(sex == "Both",
race_ethnicity == "All",
place != "U.S. Total",
year %in% 2010:2016) %>%
unique() %>%
spread(shortened_indicator_name, value) %>%
group_by(place) %>%
summarize(mean_opioid = mean(`Opioid-Related Overdose Mortality Rate`, na.rm = TRUE)) %>%
top_n(1) %>%
select(place)
```
Column {data-width=650}
-----------------------------------------------------------------------
### Final plot
```{r}
data_opioid %>%
filter(sex != "Both",
race_ethnicity == "All",
place == top_opioid$place,
year %in% 2010:2016) %>%
spread(shortened_indicator_name, value) %>%
ggplot(aes(year, `Opioid-Related Overdose Mortality Rate`, color= sex)) +
geom_line(size= 2) +
geom_point(size = 4) +
labs(x = NULL, y = "Opioid-Related Overdose Mortality Rate", title = "Opioid-use Related Mortality Rates Over Time", subtitle = "Colombus, OH", caption = "Rates are age-adjusted per 100,000 people.") +
theme_minimal() +
scale_color_OkabeIto() +
theme(legend.position = "none") +
geom_label(data = data_opioid %>%
filter(sex != "Both",
race_ethnicity == "All",
place == top_opioid$place,
year == 2016) %>%
spread(shortened_indicator_name, value),
aes(y =`Opioid-Related Overdose Mortality Rate`, label = sex),
nudge_x = -0.7,
size = 5)
```
Column {.tabset data-width=350}
-----------------------------------------------------------------------
### Version 1
```{r}
data_opioid %>%
filter(sex != "Both",
race_ethnicity == "All",
place == top_opioid$place,
year %in% 2010:2016) %>%
spread(shortened_indicator_name, value) %>%
ggplot(aes(year, `Opioid-Related Overdose Mortality Rate`, color= sex)) +
geom_line()
```
### Version 2
```{r}
data_opioid %>%
filter(sex != "Both",
race_ethnicity == "All",
place == top_opioid$place,
year %in% 2010:2016) %>%
spread(shortened_indicator_name, value) %>%
ggplot(aes(year, `Opioid-Related Overdose Mortality Rate`, color= sex)) +
geom_line(size= 2) +
geom_point(size = 4) +
labs(x = NULL, y = "Opioid-Related Overdose Mortality Rate", title = "Opioid-use Related Mortality Rates Over Time", subtitle = "Colombus, OH", caption = "Rates are age-adjusted per 100,000 people.") +
theme_minimal()
```